Abstract

In this paper, we present approaches to multi-level sequential logic synthesis: algorithms and techniques for the area and performance optimization of interconnected finite state machine descriptions.

Interacting finite state machines are common in industrial chip designs. While optimization techniques for single finite state machines are relatively well developed, the problem of optimization across latch boundaries has received much less attention. Techniques to optimize pipelined combinational logic so as to improve area/throughput have been proposed. However, logic cannot be straightforwardly migrated across latch boundaries when the basic blocks are sequential rather than combinational circuits.

We present new techniques for the exploitation of sequential don’t cares in arbitrary, interconnected sequential machine structures. Exploiting these don’t care sequences can result in significant improvements in area and performance. We address the problem of migrating logic across state machine boundaries so as to make particular machines less complex at the possible expense of making others more complex. This can be useful from both an area and a performance point of view. We present new optimization algorithms that incrementally modify state machine structures across latch boundaries. We discuss the use of more global state machine decomposition and factorization algorithms for area optimization. Finally, we present experimental results using these algorithms on sequential circuits.

1  Introduction 

Interacting finite state machines (FSMs) are common in chips being designed today. The advantages of a hierarchical, distributed-style specification and realization are many. While the terminal behavior of any set of interconnected sequential circuits can be modeled and/or realized by a lumped circuit, the distributed form can be considerably more compact, and it is easier to understand and manipulate.

The disadvantage of this form of specification from a CAD point of view is that sequential logic synthesis algorithms are generally restricted to operate on lumped circuits. State assignment algorithms (e.g., [1], [8], [3]), for instance, almost exclusively operate on single finite state machines. Given a set of interacting machines represented by State Transition Graphs, no algorithms exist to date that encode the internal states of the machines while taking their interactions into account. Indeed, if the machines are encoded separately, disregarding their interconnectivity, a sub-optimal state assignment can (and generally does) result.

Traditionally, the decomposition of an initial circuit specification into smaller, interacting sequential circuits has been performed by the logic designer. Once a decomposition has been performed, it is almost never changed, and logic synthesis tools operate on separate logic blocks independently. Unfortunately, there are no guarantees regarding the quality of the initial decomposition, in terms of minimality of communication between the machines and/or complexities of the individual machines. There exist automatic techniques that can decompose lumped sequential circuits into smaller, interacting ones (e.g., [5]). These techniques are limited in the topology of interconnections that can be achieved and severely limited in their capability to handle large circuits. Flattening the initial, distributed specification can result in a very large lumped circuit.

Efficient and flexible algorithms for re-partitioning interacting sequential circuits for area and performance optimization have not been proposed in the past. Work has been done in re-partitioning pipelined combinational logic stages (e.g., [6]). There is no restriction on migrating logic across latch boundaries when the basic blocks are combinational, provided the latches are not observable: the functionality of the circuit is unchanged by moving, say, one gate from before to after a latch. However, when sequential circuits are interconnected, as shown in Figure 1, one cannot arbitrarily move logic across pipeline latch boundaries. (We refer to flip-flops that store state as state latches and flip-flops that store intermediate values as pipeline latches.) The functionality and terminal behavior of the circuit will be changed, even though the latches are not observable.

One wishes to be able to migrate logic across pipeline latch boundaries for several reasons. The system clock period has to be greater than the delay of the longest path between any two pipeline stages. If a machine A is significantly more complex than another machine B, the critical path, and hence the system clock period, may be unnecessarily long. The clock cycle could be shortened by making A less complex at the possible expense of making B more complex. In the best case, the complexities of both A and B would decrease.

Another very important issue is the specification and exploitation of don’t cares in interconnected FSM descriptions. For example, in Figure 1, certain binary combinations may never appear at the set of latches L1. This corresponds to an incompletely specified machine B. These don’t cares can be exploited using standard state minimization strategies [9]. A more complicated form of don’t care, referred to here as a sequential don’t care, corresponds to an input sequence of vectors, say 1111, 1011, 1000, that does not appear at L1, though each of the separate vectors does appear. Sequential don’t cares are more difficult to exploit. These don’t cares are due to the limited controllability of B and can be used to optimize B. There are also other don’t cares related to the limited observability of A.

Figure 1: Interacting Finite State Machines
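To make the idea of a sequential don’t care concrete, the sketch below enumerates, for short lengths only, the sequences of vectors that a driving machine never produces at L1. It assumes A is given as a completely specified Mealy table `delta[(state, input)] = (next_state, output)` with string-valued output vectors; the table format, the function names and the bounded length `k` are illustrative assumptions, not part of the paper. It collects every length-`k` output sequence emitted from a state reachable from reset and reports the sequences, built from vectors A does emit, that never occur.

```python
from itertools import product


def reachable_states(delta, reset, inputs):
    """States of a Mealy machine reachable from its reset state."""
    seen, stack = {reset}, [reset]
    while stack:
        s = stack.pop()
        for x in inputs:
            nxt, _ = delta[(s, x)]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def output_sequences(delta, reset, inputs, k):
    """All length-k output sequences emitted from any reachable state."""
    frontier = {(s, ()) for s in reachable_states(delta, reset, inputs)}
    for _ in range(k):
        frontier = {(delta[(s, x)][0], seq + (delta[(s, x)][1],))
                    for s, seq in frontier for x in inputs}
    return {seq for _, seq in frontier}


def sequential_dont_cares(delta, reset, inputs, k):
    """Length-k sequences of emitted vectors that A never produces consecutively."""
    seen = output_sequences(delta, reset, inputs, k)
    vectors = sorted({v for seq in seen for v in seq})
    return {seq for seq in product(vectors, repeat=k) if seq not in seen}


# Hypothetical example: A alternates between asserting 10 and 01 at L1.
A = {("x", "0"): ("y", "10"), ("x", "1"): ("y", "10"),
     ("y", "0"): ("x", "01"), ("y", "1"): ("x", "01")}
dc2 = sequential_dont_cares(A, "x", ["0", "1"], 2)  # {("01","01"), ("10","10")}
```

Since the number of candidate sequences grows as (number of vectors)^k, this brute-force enumeration only suits small k, and it does not reduce its result to atomic sequences.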

In this paper, we present new algorithms for the systematic exploitation of sequential don’t cares resulting from the limited observability of a driving machine and the limited controllability of a driven machine. We show that exploiting either set of don’t cares can significantly reduce the number of states and the complexity of the driving and driven machines. A set of interacting machines can be iteratively optimized using these don’t care sets.

We also present new techniques for the area and performance optimization of interacting machines via the migration of logic across latch boundaries. If a machine A drives a machine B, our techniques can be used to reduce the number of states and the complexity of A at the possible expense of increasing the complexity of B (the number of states in B remains constant). Similarly, the number of states in B can be reduced using complementary techniques. Re-encoding algorithms that minimize the areas of A and B by changing the encoding of the intermediate lines have also been developed. These techniques are incremental, fast and have small memory requirements. They can be used to speed up the system clock and/or minimize area, in conjunction with the algorithms for don’t care exploitation. We present experimental results on several examples that illustrate the efficacy of the proposed algorithms.

Basic definitions and notations are given in Section 2. Different types of sequential don’t cares are described in Section 3, and systematic methods for the exploitation of these don’t cares are presented. Migration of logic across latch boundaries is the subject of Section 4. When a machine A receives inputs from another machine B, modifications to the intermediate lines that carry information from B to A can change the complexities of A and B. In Section 5, we present preliminary experimental results using these techniques on some examples.

2  Preliminaries 

A cube in the Boolean n-space corresponding to a logic function is written as a bit vector on a set of variables, with each bit position representing a distinct variable. The values taken by each bit can be 1, 0 or 2 (don’t care), signifying the true form, negated form and non-existence, respectively, of the variable corresponding to that position. A minterm is a cube with only 0 and 1 entries. The distance between two minterms is defined as the number of bit positions in which they differ. A cube $c_1$ is said to cover another cube $c_2$ (written as $c_1 \supseteq c_2$) if, for each bit position, the entry in $c_1$ is equal to the entry in $c_2$ or is a 2.
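The cube notation above can be sketched in a few lines by representing a cube as a string over the characters 0, 1 and 2. The string encoding and the function names are our own choices, not the paper's.

```python
def covers(c1: str, c2: str) -> bool:
    """True if cube c1 covers cube c2: at each position c1 equals c2 or is 2."""
    if len(c1) != len(c2):
        raise ValueError("cubes must have the same number of variables")
    return all(a == b or a == "2" for a, b in zip(c1, c2))


def is_minterm(c: str) -> bool:
    """A minterm has only 0 and 1 entries."""
    return set(c) <= {"0", "1"}


def distance(m1: str, m2: str) -> int:
    """Number of bit positions in which two minterms differ."""
    if not (is_minterm(m1) and is_minterm(m2)) or len(m1) != len(m2):
        raise ValueError("distance is defined here only for equal-length minterms")
    return sum(a != b for a, b in zip(m1, m2))


examples = (covers("1202", "1002"), covers("1002", "1202"), distance("1010", "1001"))
# examples == (True, False, 2)
```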

A finite state machine, $M$, is represented by its State Transition Graph (STG), $G(V, E, W(E))$, where $V$ is the set of vertices corresponding to the set of states $S_M$, $\|S_M\|$ is the cardinality of the set of states of the FSM, $E$ is the set of transition edges in $M$, and $W(E)$ are the Boolean expressions corresponding to the input and output combinations for $E$. The numbers of inputs and outputs are denoted $N_I$ and $N_O$, respectively. The input combination and present state corresponding to an edge are denoted $(i, s) \in E$, where $i$ and $s$ are cubes. The fanin and output of $(i, s)$ are denoted $fanin(i, s) \in V$ and $output(i, s)$, respectively. The complete sets of fanin and fanout edges of a state $s$ are denoted $fanin(s)$ and $fanout(s)$. The fanin state, fanout state and output of an edge $e_1$ are denoted $e_1 \rightarrow fanin$, $e_1 \rightarrow fanout$ and $e_1 \rightarrow output$, respectively. The set of fanin (fanout) edges of a state $q$ is denoted $E_{FI}(q)$ ($E_{FO}(q)$).

A starting or initial state, also called the reset state and denoted $R_M$, is assumed to exist for a machine $M$. A distinguishing sequence for a pair of states $q_1, q_2 \in S_M$ is a sequence of input vectors such that the last vector produces different outputs depending on whether $M$ is initially in $q_1$ or initially in $q_2$. Two states $q_1, q_2$ in a machine $M$ are equivalent (written as $q_1 \equiv q_2$) if they do not possess a distinguishing sequence.
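A minimal sketch of finding a shortest distinguishing sequence, assuming a completely specified Mealy machine stored as `delta[(state, input)] = (next_state, output)` (a hypothetical format). It performs a breadth-first search over pairs of states; because the search stops at the first input whose outputs differ, all earlier outputs agree and the last vector is the one that distinguishes the states, as the definition above requires. If no such input is ever reached, the two states are equivalent.

```python
from collections import deque


def distinguishing_sequence(delta, inputs, q1, q2):
    """Shortest input sequence whose last vector gives different outputs from q1
    and from q2 (all earlier outputs agree), or None if q1 and q2 are equivalent."""
    start = (q1, q2)
    parent = {start: None}  # pair -> (previous pair, input that led here)
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        a, b = pair
        for x in inputs:
            na, oa = delta[(a, x)]
            nb, ob = delta[(b, x)]
            if oa != ob:
                # Walk back through the BFS tree to recover the input prefix.
                seq, p = [x], pair
                while parent[p] is not None:
                    p, xp = parent[p]
                    seq.append(xp)
                return seq[::-1]
            if (na, nb) not in parent:
                parent[(na, nb)] = (pair, x)
                queue.append((na, nb))
    return None


# Hypothetical three-state machine.
M = {("s1", "0"): ("s2", "0"), ("s1", "1"): ("s1", "0"),
     ("s2", "0"): ("s3", "0"), ("s2", "1"): ("s1", "0"),
     ("s3", "0"): ("s3", "1"), ("s3", "1"): ("s1", "0")}
ds_12 = distinguishing_sequence(M, ["0", "1"], "s1", "s2")  # ["0", "0"]
ds_11 = distinguishing_sequence(M, ["0", "1"], "s1", "s1")  # None
```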

A differentiating sequence for a pair of states $q_1, q_2 \in S_M$ is a sequence of input vectors such that some vector (or vectors) in the sequence produces different outputs when the sequence is applied to $M$ initially in $q_1$ or initially in $q_2$, and at the end of the sequence $M$ reaches the same final state. The pair of edges corresponding to each input vector in a distinguishing or differentiating sequence are called co-edges.
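The differentiating-sequence and co-edge definitions can be checked directly by simulation. The sketch below assumes the same hypothetical Mealy-table format as before and identifies an edge by its (present state, input) pair; the function names are illustrative.

```python
def run(delta, state, seq):
    """Apply seq from state; return (final state, list of outputs)."""
    outs = []
    for x in seq:
        state, out = delta[(state, x)]
        outs.append(out)
    return state, outs


def is_differentiating(delta, q1, q2, seq):
    """Some vector gives different outputs and both runs end in the same state."""
    f1, o1 = run(delta, q1, seq)
    f2, o2 = run(delta, q2, seq)
    return f1 == f2 and o1 != o2


def co_edges(delta, q1, q2, seq):
    """Pairs of edges, identified by (present state, input), taken by each vector."""
    pairs, a, b = [], q1, q2
    for x in seq:
        pairs.append(((a, x), (b, x)))
        a, b = delta[(a, x)][0], delta[(b, x)][0]
    return pairs


# Hypothetical three-state machine.
M = {("s1", "0"): ("s2", "0"), ("s1", "1"): ("s1", "0"),
     ("s2", "0"): ("s3", "0"), ("s2", "1"): ("s1", "0"),
     ("s3", "0"): ("s3", "1"), ("s3", "1"): ("s1", "0")}
diff_23 = is_differentiating(M, "s2", "s3", ["0"])  # True
diff_12 = is_differentiating(M, "s1", "s2", ["1"])  # False
ce = co_edges(M, "s2", "s3", ["0"])
```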

A sequence of vectors $VS_1$ is said to contain another sequence $VS_2$ (written as $VS_1 \supseteq VS_2$) if $VS_2$ appears in $VS_1$.

A cascade of two machines, A and B, is denoted $A \rightarrow B$. A is the driving machine and B the driven machine.

3  Sequential  Don’t  Cares

In Figure 1, we have a machine A driving another machine B via a set of latches L1 (we neglect C for the moment). For the purposes of the discussion here, we assume that none of the latches in L1 are observable. In practice, a subset of the latches may be observable. However, the don’t care exploitation techniques described here are easily modified for the general case.

We assume that a State Transition Graph description exists for both machines A and B. Let the number of intermediate/pipeline latches in L1 be $N$. A may or may not assert all $2^N$ possible output combinations. If a certain binary combination $c_1$ never appears at L1, then B will be incompletely specified: the transition edge corresponding to an input of $c_1$ need not be specified, whatever state B is in (we don’t care what happens when B receives the input $c_1$). The more general case is when a certain combination $c_2$ never appears at L1 while B is in some set of states $Q_B \subseteq S_B$ ($S_B$ is the set of all states B can be in), though it does appear when B is in states other than those in $Q_B$. In this case, the states in $Q_B$ will have $c_2$ unspecified (if an edge on $c_2$ exists in $Q_B$, it can be removed). This type of don’t care can easily be exploited via standard state minimization algorithms that handle incompletely specified machines [9].

Figure 2: Sequential Don’t Cares
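The per-state don’t cares just described can be found by exploring the reachable states of the cascade. The sketch below is one way to do it, under stated assumptions: A and B are completely specified Mealy tables in the hypothetical format `delta[(state, input)] = (next_state, output)`, the latches L1 reset to a known vector `l0`, and B's only inputs are the vectors held in L1 (so B ignores primary inputs). For every reachable state q of B it returns the vectors that never appear at L1 while B is in q; those transitions of q could then be left unspecified before state minimization.

```python
def input_dont_cares(dA, rA, dB, rB, l0, inputs, vectors):
    """For each reachable state q of B, the vectors at L1 that B never sees while
    in q. Assumed model: A -> L1 -> B, L1 resets to l0, B reads only L1."""
    start = (rA, l0, rB)
    seen, stack = {start}, [start]
    observed = {}  # B state -> vectors seen at L1 while B is in that state
    while stack:
        sa, latch, sb = stack.pop()
        observed.setdefault(sb, set()).add(latch)
        nb, _ = dB[(sb, latch)]      # B consumes the latched vector
        for x in inputs:             # A consumes a primary input
            na, v = dA[(sa, x)]
            nxt = (na, v, nb)        # A's output is latched for the next cycle
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {q: set(vectors) - obs for q, obs in observed.items()}


VECS = ["00", "01", "10", "11"]
A = {("x", "0"): ("y", "10"), ("x", "1"): ("y", "10"),
     ("y", "0"): ("x", "01"), ("y", "1"): ("x", "01")}
# Hypothetical B: moves p -> q on 10 and returns to p on anything else.
B = {(s, v): (("q", "1") if (s, v) == ("p", "10") else ("p", "0"))
     for s in "pq" for v in VECS}
dcs = input_dont_cares(A, "x", B, "p", "00", ["0", "1"], VECS)
```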

A more complicated sequential don’t care is associated with vector sequences that never appear at L1, though all $2^N$ separate vectors appear; that is, A does not produce all possible output sequences. This type of don’t care does not have a straightforward interpretation: edges in the State Transition Graph of B cannot be removed or left unspecified. In Figure 2, a State Transition Graph corresponding to a possible B machine is shown. The machine is state minimal. We assume that each transition edge in B is irredundant, i.e., B makes every transition under appropriate input sequences. A don’t care input sequence is shown below the graph. Such a don’t care sequence implies that certain sequences of transitions will not be made by B.

A don’t care input sequence is assumed to have a length greater than 1. Given a don’t care sequence $DC$, all sequences $SE$ such that $SE \supseteq DC$ are also don’t care sequences. We define an atomic don’t care sequence as one that does not contain any other don’t care sequence. Thus, any proper subsequence of an atomic don’t care sequence is a care sequence. In the sequel, we consider only atomic don’t care sequences.
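Containment and atomicity can be expressed as follows, reading "appears in" as "occurs as a contiguous run of consecutive vectors", which is our interpretation rather than a statement from the paper. The example reuses the vectors 1111, 1011 and 1000 from the Introduction purely as labels; the function names are hypothetical.

```python
def contains(seq, sub):
    """seq ⊇ sub, read here as: sub occurs as a contiguous run of vectors in seq."""
    seq, sub = tuple(seq), tuple(sub)
    m = len(sub)
    return any(seq[i:i + m] == sub for i in range(len(seq) - m + 1))


def atomic(dc_seqs):
    """Keep only don't care sequences that contain no other don't care sequence."""
    dcs = {tuple(s) for s in dc_seqs}
    return {s for s in dcs if not any(t != s and contains(s, t) for t in dcs)}


kept = atomic([("1111", "1011"), ("0000", "1111", "1011"), ("1011", "1000")])
# ("0000", "1111", "1011") is dropped because it contains ("1111", "1011").
```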

Given a set of sequences that a driving machine never asserts, our problem lies in exploiting this form of don’t care so as to optimize B. In the general case, we will have a set of don’t care sequences. We can state the following lemma.

Lemma 3.1: Given a machine B and a set of don’t care sequences $DC_i$, $1 \le i \le N_D$, if the distinguishing sequences of two states $s_1$ and $s_2$ in B are $I_k$, $1 \le k \le N_P$, and for each $k$, $I_k \supseteq DC_i$ for some $i$, then $s_1$ and $s_2$ are equivalent in B under the $DC_i$.

Proof: Since the $DC_i$ can never occur and each $I_k$ contains some $DC_i$, the $I_k$ can never occur. Therefore, $s_1$ and $s_2$ in B are equivalent under the $DC_i$. Q.E.D.


An approach to exploit don’t cares based on Lemma 3.1 would entail producing all distinguishing sequences for every pair of states in B and checking for the containment condition. Pairs satisfying the condition can be merged. This is potentially very time consuming: a pair of states may have many distinguishing sequences, and we have to find them for every possible pair. A more efficient approach is now outlined.

In this approach, given a set of don’t care sequences, B is transformed into a new machine B' which has a greater number of states but is more incompletely specified than B. B' is state minimized to obtain B'' ($\|S_{B''}\| \le \|S_B\|$). The pseudo-code below illustrates the procedure.

exploit-input-dc( B, DC ):
{
    B' = B ;
    foreach ( don’t care sequence DC_i ) {
        foreach ( depth-first path P = e_1, ..., e_K ∈ B' ) {
            if ( P ⊇ DC_i ) {
                for ( i = 2 ; i <= K ; i = i + 1 ) {
                    s_i = e_i -> fanout ;
                    make states s_i' and s_i'' ;
                    fanin(s_i') = e_{i-1} ;
                    fanin(s_i'') = fanin(s_i) - e_{i-1} ;
                    if ( fanin(s_i'') = ∅ ) delete s_i'' ;
                    if ( i < K )
                        fanout(s_i') = fanout(s_i'') = fanout(s_i) ;
                    else {
                        fanout(s_i') = fanout(s_i) - e_i ;
                        fanout(s_i'') = fanout(s_i) ;
                    }
                    delete s_i ;
                }
            }
        }
    }
    B'' = state-minimize( B' ) ;
}


The procedure effectively produces a machine in which the don’t care sequences are not specified, but which otherwise has the same functionality as the original machine. This means that if any two states in B satisfy the conditions of Lemma 3.1, these two states will not possess a distinguishing sequence in B' and will thus be compatible during state minimization. A smaller machine B'' will be obtained after state minimization. When $i = p < K$ in the for loop above, the fanout of $s_p$ is duplicated for the states $s'_p$ and $s''_p$, so the edge $e_p$ is also duplicated. Hence, at the next iteration, one copy of $e_p$ fans into $s'_{p+1}$ and the other copy of $e_p$ (as well as the remaining fanout edges from $s'_p$ and $s''_p$) fans into $s''_{p+1}$.
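The procedure above splits states along each path that spells a don’t care sequence. A compact way to obtain a machine with the same property, sketched below under our own assumptions, is a product construction: B is run alongside a small monitor that remembers the longest suffix of the input history that is a proper prefix of a don’t care sequence (the same idea as a Knuth-Morris-Pratt or Aho-Corasick matcher), and any transition that would complete a don’t care sequence is left unspecified. This is a swapped-in technique, not the paper's edge-splitting procedure; the Mealy-table format and names are hypothetical, and the state-minimize step for the resulting incompletely specified machine is not shown.

```python
def expand_with_dc(delta, reset, inputs, dc_seqs):
    """Return (delta2, reset2) for a machine B' whose states are pairs (q, h): q is
    a state of B and h is the longest suffix of the input history that is a proper
    prefix of some don't care sequence. Transitions that would complete a don't
    care sequence are omitted (left unspecified); all others copy B's behavior."""
    dcs = {tuple(s) for s in dc_seqs}
    prefixes = {s[:k] for s in dcs for k in range(len(s))}  # proper prefixes, incl. ()

    def advance(h, x):
        h = h + (x,)
        if any(h[k:] in dcs for k in range(len(h))):
            return None  # a don't care sequence has just been completed
        for k in range(len(h) + 1):  # try longer suffixes first
            if h[k:] in prefixes:
                return h[k:]
        return ()

    reset2 = (reset, ())
    delta2, seen, stack = {}, {reset2}, [reset2]
    while stack:
        q, h = stack.pop()
        for x in inputs:
            h2 = advance(h, x)
            if h2 is None:
                continue  # unspecified transition: a don't care for minimization
            nq, out = delta[(q, x)]
            delta2[((q, h), x)] = ((nq, h2), out)
            if (nq, h2) not in seen:
                seen.add((nq, h2))
                stack.append((nq, h2))
    return delta2, reset2


# Hypothetical two-state B and the don't care sequence 1, 1.
B = {("a", "0"): ("a", "0"), ("a", "1"): ("b", "0"),
     ("b", "0"): ("a", "1"), ("b", "1"): ("b", "1")}
B1, r1 = expand_with_dc(B, "a", ["0", "1"], [("1", "1")])
```

In this example the transition of state ("b", ("1",)) on input 1 is left unspecified, since taking it would complete the don’t care sequence.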

An illustrative example is given in Figures 2, 3 and 4. The machine and the don’t care sequence of Figure 2 produce an expanded machine, shown in Figure 3. State minimizing this machine produces the result of Figure 4, which has one state fewer than the original machine of Figure 2. States s3', s4' and s4'' merge, and so do s1 and s2.

Figure 3: Expanding the Original Machine

Figure 4: Machine after State Minimization

The sequential don’t cares discussed thus far are a product of the constrained controllability of the driven machine B in a cascade $A \rightarrow B$. There is another type of don’t care due to the constrained observability of the driving machine A. We focus on the individually state-minimized tables of Figure 5, in which the intermediate inputs/outputs have been given symbolic codes. Given that A feeds into B, it is quite possible that for some transition edge $e_1 \in A$, it does not matter whether the output asserted by this particular transition edge is, say, $INT_i$ or $INT_j$. In fact, in Figure 5, the 3rd transition edge can assert either INT1 or INT2 without changing the terminal behavior of $A \rightarrow B$ (we assume that there are no latches between A and B, that the starting state of A is sa1 and that the starting state of B is sb1). This is a don’t care condition on A’s outputs. Making use of these don’t cares can reduce the number of states in A. In fact, if one replaced the output of the 3rd edge in A (Figure 5) by INT1 instead of INT2, one would obtain one state fewer after state minimization (sa2 becomes equivalent to sa5).

Given a cascade $A \rightarrow B$, we give below a systematic procedure to detect this type of don’t care, i.e., to expand the output of each transition edge of A to the set of all possible values that it can take while maintaining the terminal behavior of $A \rightarrow B$. Standard state minimization procedures can exploit don’t care outputs represented as cubes. However, state minimization procedures have to be modified in order to exploit transition edge outputs represented as arbitrary Boolean expressions (multiple cubes).

Figure 5: Output Expansion


output-expansion( A, B ):
{
    foreach ( edge e_1 ∈ A ) {
        OUT(e_1) = universe ;
        foreach ( state q_1 ∈ S_B ) {
            if ( B can be in q_1 as A asserts e_1 ) {
                find largest set of output combinations c_1 ⊇ e_1 -> output
                    such that fanin(c_1, q_1) and output(c_1, q_1) are unique ;
                OUT(e_1) = OUT(e_1) ∩ c_1 ;
            }
        }
        e_1 -> output = OUT(e_1) ;
    }
}


A transition edge $e_1$ in A is picked. The set of states that B can be in when A makes this transition is found. Given this set of states, the largest cube (or set of output combinations) that covers the output of the edge and produces a unique next state and a unique output when B is in any one of the possible states is found (this corresponds to $OUT(e_1)$). The output of $e_1$ is expanded to this cube. The process is repeated for all edges in A.
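A sketch of output expansion for symbolic intermediate values, assuming, as in the Figure 5 discussion, that there are no latches between A and B, that A's outputs are observed only through B, and that both machines are completely specified Mealy tables (a hypothetical format). For symbolic values, the largest set allowed at a state q of B is the set of values that give the same next state and output as the original value at q, so OUT(e) is the intersection of these sets over all states B can be in when A takes e. Because every reachable (A state, B state) pair then behaves identically, any choice from the expanded sets leaves the terminal behavior of the cascade unchanged.

```python
def output_expansion(dA, rA, dB, rB, inputs, symbols):
    """For each edge (sa, x) of A, the set of symbolic values A could assert on that
    edge without changing B's next state or output in any state B can occupy."""
    # Reachable (A state, B state) pairs; no latch between A and B (assumed).
    start = (rA, rB)
    pairs, stack = {start}, [start]
    while stack:
        sa, sb = stack.pop()
        for x in inputs:
            na, v = dA[(sa, x)]
            nb, _ = dB[(sb, v)]
            if (na, nb) not in pairs:
                pairs.add((na, nb))
                stack.append((na, nb))
    expanded = {}
    for (sa, x), (_, v0) in dA.items():
        qs = {sb for (a, sb) in pairs if a == sa}  # states B can be in on this edge
        if not qs:
            continue  # source state of the edge is unreachable
        expanded[(sa, x)] = {v for v in symbols
                             if all(dB[(q, v)] == dB[(q, v0)] for q in qs)}
    return expanded


A = {("a1", "0"): ("a1", "INT1"), ("a1", "1"): ("a2", "INT3"),
     ("a2", "0"): ("a1", "INT2"), ("a2", "1"): ("a2", "INT3")}
B = {("b1", "INT1"): ("b1", "0"), ("b1", "INT2"): ("b1", "0"),
     ("b1", "INT3"): ("b1", "1")}
OUT = output_expansion(A, "a1", B, "b1", ["0", "1"], ["INT1", "INT2", "INT3"])
```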

The state minimization procedure proposed in [9] can be used for incompletely specified finite state machines. However, after output expansion, we may have a multiple-output FSM in which a transition edge has an output that can belong to a subset of symbolic or binary values, rather than to the universe of possible values (as in the incompletely specified case). When multiple cubes or Boolean expressions specify the output combinations of fanout edges, an additional check has to be performed during state minimization, when compatibility pairs are selected, to see whether three or more states can, in fact, be merged while preserving functionality. This is because the pairwise intersections of the Boolean expressions corresponding to the fanout edges of these states may each be non-null, resulting in a compatibility relation between each pair of states, while the three-way intersection may be null, implying that the three states cannot be merged.
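The three-way check can be illustrated with hypothetical output sets: each pair of the three states below agrees on some value for a given input, yet no single value satisfies all three. Only the output part of compatibility is modeled here; a full check would also compare next states.

```python
from functools import reduce


def common_outputs(output_sets):
    """Values allowed by every state's fanout edge on one input (set intersection)."""
    return reduce(lambda acc, s: acc & s, output_sets)


# Hypothetical allowed outputs of three states on the same input after expansion.
s1, s2, s3 = {"INT1", "INT2"}, {"INT2", "INT3"}, {"INT1", "INT3"}
pairwise_ok = all(common_outputs(p) for p in [(s1, s2), (s2, s3), (s1, s3)])  # True
triple_ok = bool(common_outputs([s1, s2, s3]))  # False: no common value
```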

When we have a set of interconnected machines as in Figure 1, the don’t cares corresponding to each cascade can be used iteratively. For instance, in Figure 1, A drives B. The outputs of A’s edges can be expanded first. A’s output don’t care sequences can be used to optimize B. Next, one can focus on $B \rightarrow C$: output expansion can be performed on B, and so on.


4  Optimization  Across  Latch  Boundaries

4.1  Introduction 

A set of interacting machines can be optimized using their associated don’t cares, as described in the previous section. If the initial decomposition is not an intelligent one, there will be a large set of don’t cares associated with each pair of driving and driven machines. While exploiting don’t cares has the effect of removing redundancy, the overall decomposition of logic functionality between the various circuits remains the same. As mentioned earlier, there are several attractions in being able to migrate logic from one machine to another. In this section, we present incremental techniques that optimize cascaded pairs of machines via logic migration. These techniques are applied iteratively in the general case of interacting machines (as in Figure 1).

4.2  Re-encoding 

Consider the cascaded pair of Figure 5. The intermediate line values have been represented by symbolic codes. The complexities of the machines are affected by the encoding of these lines. A good output encoding for A will minimize the complexity of A. However, a good output encoding for A may not be a good input encoding for B, and vice versa. Thus, tradeoffs exist.

We propose re-encoding as a means of migrating logic between the two machines by exploring these tradeoffs. If the initial specification of the intermediate lines is binary (rather than symbolic), the specification is converted into a symbolic representation. For instance, one might view the machines of Figure 5 as being derived from a logic implementation where INT1 had the code 100, INT2 had the code 010, and so on. We can re-encode these lines in different ways to tune the complexities of A and B. Re-encoding can be performed before or after state assignment.
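Re-encoding itself is a substitution of binary codes for the symbolic values, applied consistently to A's output columns and B's input columns. The sketch below (function name, table format and example machines are hypothetical) shows that mechanism with the one-hot codes mentioned above and an alternative dense code. It does not measure complexity; choosing good codes is the job of input and output encoding strategies such as those of [8] and [7].

```python
def re_encode(dA, dB, code):
    """Replace symbolic intermediate values with binary codes on both sides of a
    cascade A -> B. The same code is used for A's outputs and B's inputs, so the
    terminal behavior of A -> B is unchanged; only the encoded logic differs."""
    if len(set(code.values())) != len(code):
        raise ValueError("codes must be distinct")
    newA = {k: (ns, code[v]) for k, (ns, v) in dA.items()}
    newB = {(s, code[v]): t for (s, v), t in dB.items()}
    return newA, newB


A = {("a1", "0"): ("a1", "INT1"), ("a1", "1"): ("a2", "INT2"),
     ("a2", "0"): ("a1", "INT3"), ("a2", "1"): ("a2", "INT2")}
B = {("b1", "INT1"): ("b1", "0"), ("b1", "INT2"): ("b2", "0"),
     ("b1", "INT3"): ("b1", "1"), ("b2", "INT1"): ("b1", "1"),
     ("b2", "INT2"): ("b2", "0"), ("b2", "INT3"): ("b1", "0")}
one_hot = {"INT1": "100", "INT2": "010", "INT3": "001"}
dense = {"INT1": "00", "INT2": "01", "INT3": "10"}
A1, B1 = re_encode(A, B, one_hot)
A2, B2 = re_encode(A, B, dense)
```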

If one wished to reduce B’s complexity, the intermediate symbolic implicants would be assigned binary values corresponding to an optimal input encoding. Strategies for optimal input encoding have been proposed [8]. Heuristics for output encoding to reduce A’s complexity, as in [7], can also be used.

It has been determined experimentally that re-encoding affects the relative complexities of the machines by as much as 25%. However, the number of states in the machines is unchanged.

4.3  Optimizing  the  Driven  Machine 

Consider again the cascaded pair of Figure 5. The symbolic implicants $INT_i$ constitute the means of information flow from A to B. It is conceivable that for some pair of states $q_1, q_2 \in S_B$, a particular input vector $INT_x$ is required as the first vector in each distinguishing sequence for the pair (for instance, INT1 is required to distinguish sb1 and sb3 in Figure 5). If one were to modify A so as to produce $INT_x' \ne INT_x$ when B is in $q_1$ and $INT_x$ otherwise, the distinguishing sequences would be invalidated: $q_1$ would become equivalent to $q_2$, and B could be reduced. This is the basic process behind the technique described in this section.

The algorithm identifies symbolic implicants which, when split up, result in state reductions in B. The number of states in A remains constant. The complexity of A may increase, since A now asserts a larger number of distinct symbolic outputs. Even if a particular symbolic implicant appears at the front of every distinguishing sequence of a pair of states in B, it is not always possible to reduce the number of states in B. The following theorem states the required conditions.

Theorem 4.1: Given a cascade $A \rightarrow B$, let the distinguishing sequences for a pair of states $q_1, q_2 \in Q_B$ be $DS_1, DS_2, \ldots, DS_M$. Let the distinct first vectors in the $DS_i$ be $o_1, o_2, \ldots, o_N$. When B is in $q_1$ ($q_2$), let the possible transition edges that A has just made be $E_{(A, B=q_1)}$ ($E_{(A, B=q_2)}$). Let $E_{j1} \subseteq E_{(A, B=q_1)}$ and $E_{j2} \subseteq E_{(A, B=q_2)}$ be the sets of edges that assert $o_j$, $\forall j$. If $E_{j1} \cap E_{j2} = \emptyset$ for $1 \le j \le N$, then $q_1$ and $q_2$ can be merged in B.

Proof: We make the edges in $E_{j1}$ assert $o'_j \ne o_j$ (distinct from all other symbolic implicants). This means that when B is in $q_1$, it will never receive $o_j$, $1 \le j \le N$. Similarly, since $E_{j1} \cap E_{j2} = \emptyset$, when B is in $q_2$ it never receives $o'_j$, $1 \le j \le N$. The first vector in each distinguishing sequence $DS_i$, $1 \le i \le M$, is thus invalidated. Therefore, $q_1 \equiv q_2$. Q.E.D.

For states of B other than $q_1$ and $q_2$, $o'_j$ is made to produce the same next state and outputs as $o_j$, $\forall j$. $N = 1$ is the simplest case of state reduction.
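The condition of Theorem 4.1 is easy to check once the sets $E_{(A, B=q)}$ are known. In the sketch below (hypothetical names and data), `edges_when[q]` plays the role of $E_{(A, B=q)}$, `edge_out` maps each edge of A to its symbolic output, and `first_vectors` lists the $o_j$, which could be obtained as the first vectors of the distinguishing sequences. Computing `edges_when` itself needs a reachability analysis of the cascade and is not shown. The fresh symbol $o'_j$ is formed by appending a prime, assumed not to clash with existing names.

```python
def theorem_4_1_holds(edges_when, edge_out, q1, q2, first_vectors):
    """For every first vector o_j, the A edges asserting o_j just before B is in q1
    must be disjoint from those asserting o_j just before B is in q2."""
    for o in first_vectors:
        e_j1 = {e for e in edges_when[q1] if edge_out[e] == o}
        e_j2 = {e for e in edges_when[q2] if edge_out[e] == o}
        if e_j1 & e_j2:
            return False
    return True


def split_outputs(edges_when, edge_out, q1, first_vectors):
    """Relabel the edges in E_j1 with a fresh symbol o_j' (named o + "'")."""
    new_out = dict(edge_out)
    for o in first_vectors:
        for e in edges_when[q1]:
            if edge_out[e] == o:
                new_out[e] = o + "'"  # assumed to be a fresh symbol
    return new_out


# Hypothetical data: A edges that may have just been taken when B is in each state.
edges_when = {"q1": {"e1", "e2"}, "q2": {"e2", "e3"}}
edge_out = {"e1": "INT1", "e2": "INT2", "e3": "INT1"}
ok_int1 = theorem_4_1_holds(edges_when, edge_out, "q1", "q2", ["INT1"])  # True
ok_int2 = theorem_4_1_holds(edges_when, edge_out, "q1", "q2", ["INT2"])  # False: e2 shared
relabeled = split_outputs(edges_when, edge_out, "q1", ["INT1"])
```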

This technique essentially splits the symbolic outputs of machine A and introduces new don’t care sequences to B. These don’t care sequences are then used to reduce the complexity of B. The above theorem has a straightforward practical interpretation. For optimization purposes, we focus on the symbolic implicants that appear most frequently as first vectors in distinguishing sequences for different pairs of states. The edge disjointness condition of Theorem 4.1 is checked, and the implicants are split if the condition is satisfied, so as to reduce the number of states in B.

4.4  Optimizing  the  Driving  Machine 

A technique complementary to the one described in the previous section can be used to decrease the complexity of the driving machine, A. Here, states in the driven machine B are split. Splitting these states in B results in new degrees of freedom in expanding the outputs of the edges in A. Output expansion then reduces the number of states and the complexity of the driving machine A.

Again, the symbolic output implicants of A cannot be arbitrarily merged, since one has to maintain the terminal behavior of $A \rightarrow B$. The following theorem states the conditions required for implicant merging to be possible.

Theorem 4.2: Given a cascade $A \rightarrow B$, let a transition edge $e \in A$ assert the symbolic output $o_p$. When A makes the transition $e$, let the possible states B can be in be $Q_{(B, A|e)}$. If, for a symbolic output $o_{i*}$,

$$\forall q \in Q_{(B, A|e)}: \big( fanin(o_{i*}, q) = fanin(o_p, q) \;\wedge\; output(o_{i*}, q) = output(o_p, q) \big) \;\vee\; \big( (E_{FI}(q) \cap E_{(B, A|E_{i*})}) \cap (E_{FI}(q) \cap E_{(B, A|e)}) = \emptyset \big)$$

then $e \rightarrow output$ can be expanded to $(o_{i*}, o_p)$. Here $E_{i*}$ is the set of transition edges asserting output $o_{i*}$, and $E_{(B, A|E_{i*})}$ ($E_{(B, A|e)}$) is the set of transitions B can make when A is making the transitions $E_{i*}$ ($e$).

Proof: We split each $q \in Q(b, A|e)$ for which an $o_j'$ cannot be found such that $\big( \mathit{fanin}(o_j', q) = \mathit{fanin}(o_p, q) \wedge \mathit{output}(o_j', q) = \mathit{output}(o_p, q) \big)$ into two states $q'$ and $q''$, initially duplicating the fanout of $q$. $q'$ receives as fanin $E_{FI}(q) - \big( E_{FI}(q) \cap E(b, A|e) \big)$ and $q''$ receives $E_{FI}(q) \cap E(b, A|e)$. When $B$ is in $q''$, it never receives $o_j'$ from $A$, so those fanout edges of $q''$ (the ones taken on $o_j'$) can be deleted. This means that the condition for expanding the outputs of edge $e$ to $(o_j', o_p)$ is satisfied (Section 3). Q.E.D.
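The state split used in the proof can be sketched as a small graph transformation. This is a minimal sketch, assuming a hypothetical dictionary encoding of B's STG, `delta[(state, sym)] = (next_state, output)`, with string state names and with $E(b, A|e)$ supplied as a set of `(state, sym)` keys; `split_state` and its parameters are illustrative names. The sketch only performs the split; it does not check the edge disjointness condition of Theorem 4.2 that makes the deletion safe.

```python
def split_state(delta, q, e_edges, o_j):
    """Split state q of B into q' and q'' following the proof of Theorem 4.2.

    delta   : hypothetical STG encoding, delta[(state, sym)] = (next_state, output)
    e_edges : B transitions, as (state, sym) keys, taken while A makes edge e,
              i.e. E(b, A|e); only the ones entering q matter here
    o_j     : the symbolic output o_j' that B never receives while in q''
    """
    q1, q2 = q + "'", q + "''"

    def target(src, sym, dst):
        # Edges entering q go to q'' if they lie in E(b, A|e), otherwise to q'.
        if dst != q:
            return dst
        return q2 if (src, sym) in e_edges else q1

    new = {}
    for (src, sym), (dst, out) in delta.items():
        nxt = (target(src, sym, dst), out)
        if src != q:
            new[(src, sym)] = nxt
            continue
        # Both copies initially duplicate the fanout of q ...
        new[(q1, sym)] = nxt
        # ... but q'' drops its edge on o_j', leaving that input a don't care.
        if sym != o_j:
            new[(q2, sym)] = nxt
    return new
```

Self-loops on q are handled by letting each copied edge inherit the $E(b, A|e)$ membership of the original edge.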

Splitting states in B has the effect of introducing new output don't cares for A, which can reduce the complexity of A. If, for each differentiating sequence for a pair of states $q_1, q_2 \in S_A$, each pair of co-edges $e_1, e_2$ that assert different outputs is expanded so that $e_1 \rightarrow \mathit{output} \supseteq e_2 \rightarrow \mathit{output}$ or $e_2 \rightarrow \mathit{output} \supseteq e_1 \rightarrow \mathit{output}$, then $q_1$ and $q_2$ can be merged in A.
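The merging condition above can be checked by walking the pairs of co-edges reachable from $(q_1, q_2)$: any co-edge pair asserting different outputs ends some differentiating sequence, so requiring nested output sets on every reachable co-edge pair covers all differentiating sequences. This is a minimal sketch, assuming a hypothetical encoding of A in which each edge carries its expanded output as a `frozenset` of symbolic outputs, `delta[(state, input)] = (next_state, frozenset_of_outputs)`; `mergeable` is an illustrative name.

```python
from collections import deque


def mergeable(delta, q1, q2, symbols):
    """Check the nesting condition on all co-edges reachable from the pair (q1, q2)."""
    seen, work = {(q1, q2)}, deque([(q1, q2)])
    while work:
        p, q = work.popleft()
        for x in symbols:
            (np_, op), (nq, oq) = delta[(p, x)], delta[(q, x)]
            # Co-edges asserting different outputs must have nested (expanded) output sets.
            if not (op <= oq or oq <= op):
                return False
            # Continue along the sequence unless both copies reach the same state.
            nxt = (np_, nq)
            if np_ != nq and nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return True
```

With unexpanded (singleton) outputs the check reduces to ordinary output equivalence of the two states.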

The strategy used in optimization is to remove a particular symbolic output in A by operating on all edges that assert this particular output.

1. The STG of A is analyzed to find which of the symbolic outputs appear (as last vectors) in the most differentiating sequences. One particular symbolic output, $o_p$, is picked.

2. All the transitions in A, $E_p$, that assert $o_p$ are found. For each $e \in E_p$, the set of states B can be in after A makes transition $e$, $Q(b, A|e)$, is found.

3. The fanouts of states in $Q(b, A|e)$ are analyzed to pick a symbolic output $o_j' \neq o_p$ that produces the same next states and outputs as $o_p$ in a maximum number of states of $Q(b, A|e)$.

4. For each $q \in Q(b, A|e)$ for which $o_j'$ and $o_p$ produce different next states or outputs, the fanin of $q$ is checked for a possible split as per Theorem 4.2. If the split is possible, go to Step 2 and pick a new edge $e$. Else, go to Step 4 and pick a new $o_j'$. If all possible $o_j'$ have been exhausted, go to Step 1 and pick a new $o_p$.

5. Split states in B corresponding to the selected $o_p$ and $o_j'$, $\forall e \in E_p$.
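Steps 2 and 3 can be sketched with a reachability pass over the cascade. This minimal sketch assumes hypothetical dictionary encodings of completely specified Mealy machines, `A[(state, primary_input)] = (next_state, symbolic_output)` and `B[(state, symbolic_output)] = (next_state, output)`, with B reacting to A's output in the same cycle and unconstrained primary inputs; `reachable_pairs`, `states_after` and `pick_substitute` are illustrative names, and the Theorem 4.2 check of Step 4 is not included.

```python
from collections import deque


def reachable_pairs(A, B, a0, b0, inputs):
    """Reachable (state_A, state_B) pairs of the cascade A -> B from (a0, b0)."""
    seen, work = {(a0, b0)}, deque([(a0, b0)])
    while work:
        sa, sb = work.popleft()
        for x in inputs:
            na, o = A[(sa, x)]  # A takes edge e = (sa, x) and asserts o
            nb, _ = B[(sb, o)]  # B reacts to o in the same cycle
            if (na, nb) not in seen:
                seen.add((na, nb))
                work.append((na, nb))
    return seen


def states_after(A, B, pairs, e):
    """Step 2: Q(b, A|e), the states B can be in after A makes transition e = (sa, x)."""
    sa, _ = e
    o = A[e][1]
    return {B[(sb, o)][0] for (pa, sb) in pairs if pa == sa}


def pick_substitute(B, q_set, o_p, candidates):
    """Step 3: the output o_j' != o_p matching o_p (next state and output) on most states."""
    return max((c for c in candidates if c != o_p),
               key=lambda c: sum(B[(q, c)] == B[(q, o_p)] for q in q_set))
```

Ties in Step 3 are broken by candidate order here; the states of $Q(b, A|e)$ on which the chosen output still disagrees with $o_p$ are the ones passed to the split check of Step 4.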

While this algorithm does not guarantee a reduction in the number of states in A, it guarantees a reduction in A's complexity on completion, since A now asserts fewer symbolic outputs. Generally, a reduction in states is also obtained.

4.5  Partial  Collapsing 

When the driven machine B in a cascade $A \rightarrow B$ has multiple outputs, its complexity can be reduced by collapsing or flattening one or more outputs into A.

An output of B is selected and two separate machines B' and B'' operating in parallel are constructed, with B' producing the single selected output and B'' the remaining ones. The STG of B can be initially duplicated for B' and B'' and then re-minimized after removing the appropriate outputs. Both B'' and B' will be less complex than B. B' can then be collapsed into A: a new machine corresponding to the direct product of A and B' is obtained, which drives B''. If latches initially exist between A and B, flattening is more complicated, since a latch itself represents a two-state finite state machine; the product of A, the latches and B' has to be constructed.

Table 1: Statistics of Examples
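The construction just described can be sketched for the latch-free case. This minimal sketch assumes hypothetical dictionary encodings `A[(state, primary_input)] = (next_state, symbolic_output)` and `B[(state, symbolic_output)] = (next_state, output_bits_tuple)`; `project` duplicates B's STG keeping only selected output bits (the re-minimization step is omitted), and `collapse` builds the reachable part of the direct product of A and B'. Both names are illustrative.

```python
from collections import deque


def project(B, keep):
    """Copy B's STG, keeping only the output bits whose indices are in `keep`."""
    return {k: (nxt, tuple(out[i] for i in keep)) for k, (nxt, out) in B.items()}


def collapse(A, Bp, a0, b0, inputs):
    """Direct product of A and B', restricted to states reachable from (a0, b0).

    The product asserts A's symbolic output (which still drives B'') together
    with the selected output of B', so the cascade becomes (A x B') -> B''.
    """
    P, seen, work = {}, {(a0, b0)}, deque([(a0, b0)])
    while work:
        sa, sb = work.popleft()
        for x in inputs:
            na, o = A[(sa, x)]
            nb, y = Bp[(sb, o)]
            P[((sa, sb), x)] = ((na, nb), (o, y))
            if (na, nb) not in seen:
                seen.add((na, nb))
                work.append((na, nb))
    return P
```

Usage: with `Bp, Bpp = project(B, [i]), project(B, [j for j in range(n) if j != i])`, the pair `collapse(A, Bp, a0, b0, inputs)` and `Bpp` forms the new cascade; a state minimizer would normally be applied to each projected machine first.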

Partial collapsing reduces the complexity of the driven machine in a cascade, but can significantly increase the complexity of the driving machine. Its uses are limited, but it is applicable in cases where the driven machine is significantly more complex than the driving machine.

The flattened machine can be re-decomposed into a cascade using the classical decomposition algorithms of [5]. General decomposition algorithms that produce two interacting submachines $A \leftrightarrow B$ from the original description, attempting to minimize the complexities of the submachines, have recently been proposed [4]. These algorithms are more powerful than those in [5], since the interaction between the submachines is two-way rather than unidirectional. Using these decomposition algorithms allows more global optimization, at the expense of losing control over the optimization and the ability to handle large circuits.

5  Results 

We have run several examples to evaluate the optimization strategies and algorithms described in Sections 3 and 4. Table 1 gives the statistics of the sequential circuits we experimented with. All these circuits were obtained by interconnecting the finite state machines of the MCNC 1987 Logic Synthesis Workshop benchmark set. For each example, Table 1 indicates the number of primary inputs (pi) and primary outputs (po), the number of separate machines (mac) and the number of states in each machine in the circuit (states). The number of intermediate, non-observable/non-controllable lines (int) and the total number of literals (lit) after state assignment using MUSTANG [3] and multi-level combinational optimization using MIS [2] are also given.

We first give results from using the don't care exploitation algorithms, in Table 2. For each circuit, the number of states in each of the machines and the total literal count are given, as well as the CPU time required for optimization on a VAX 11/8650 (m stands for minutes). Significant reductions in circuit complexity have been obtained.

We used the re-encoding algorithms of Section 4.2 on the cascaded pairs. Table 3 gives the literal counts for each of the two machines in the circuit, originally (orig.) and for the extreme cases of re-encoding (input and output). As before, the literal counts are after state assignment and logic optimization. As can be seen, re-encoding affects the complexity of the individual machines by as much as 25%. Cascade ex1, for instance, would be best implemented using an input-encoded driven machine so as to make the complexities of the driven and driving machines comparable.

Finally, we present results using the logic migration algorithms of Sections 4.3 and 4.4. The number of states in the individual machines of a sequential circuit can be reduced or increased using these algorithms. Tables 4 and 5 give the number of states in the optimized machines and the new literal counts obtained using the strategies of Sections 4.3 and 4.4, respectively. As with re-encoding, solutions in between these extremes can be obtained; however, the numbers in Tables 4 and 5 illustrate the range of capabilities of the proposed algorithms.

6  Conclusions 

We presented algorithms and techniques for the area and performance optimization of interconnected finite state machine structures. These algorithms include don't care exploitation techniques as well as logic migration techniques across latch boundaries in interacting sequential structures. The results we have obtained using these algorithms thus far are encouraging.